2D to 3D Video Conversion with Static Scene and Horizontal Camera Motion

Authors

  • Ling-Wei Lee
  • Tsuhan Chen
Abstract

Master of Electrical Engineering Program, Cornell University, Design Project Report. Project Title: 2D to 3D Video Conversion with Static Scene and Horizontal Camera Motion. Author: Ling-Wei Lee.

Abstract: Four automated methods for converting two-dimensional (2D) video to three-dimensional (3D) video, restricted to videos with a static scene and horizontal camera motion, are proposed in this project. Our approaches generate the paired view for each frame of the video using pixels from the other frames. In two of the methods, we use structure from motion to calculate the camera centers of all the frames and where their paired views are supposed to be, and we synthesize those views by interpolating between the two real frames nearest to them. We compare these methods with two naïve approaches: choosing the frame a constant N positions after each frame as its pair, or picking the real frame closest to the virtual pair as the paired view. The methods that synthesize the paired view by interpolation produce promising results under our assumptions of a static scene and horizontal camera motion.
Report Approved by Project Advisor:____________________________Date: _______

I. INTRODUCTION

A. Summary

In recent years, the consumer and film-making industries have paid more attention to 3D films, and more 3D videos have been shot with stereo cameras. However, most videos are still filmed in 2D, and converting them to 3D in post-processing often requires recovering the depth manually, which is costly and inefficient. Although stereo vision and depth recovery from images have been studied in computer vision for a long time, there is still no standard method to automate the 2D to 3D conversion process. In this project, we investigate automated methods to convert 2D video into 3D for videos with a static scene and horizontal camera motion, as an initial approach to the problem. We have explored four methods built on classic computer vision techniques such as structure from motion, optical flow, and block matching for motion estimation and depth recovery. The basic idea behind all the methods in this project is to use the information in the other frames of the video to synthesize a paired view for each frame, so that in the output each frame contains two views, the original and its synthesized pair, for the left and right eyes. We started this project by implementing the first method, in which the paired view for each frame is simply a copy of the frame a constant N positions after it. One can imagine that synthesizing the pair this way introduces considerable error when the camera does not move at constant speed; however, this method serves as a baseline for comparison with the other methods in this project. In the second method, we improve on the first by choosing the paired view based on location instead of time (the constant Nth frame after).
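The location-based pairing in the second method can be sketched as follows. This is a minimal illustration, not the report's implementation: the camera centers would come from Bundler's estimated camera parameters, and the baseline value and the assumption that motion is along the +x axis are hypothetical choices made here for concreteness.

```python
import numpy as np

def pick_paired_frames(centers, baseline):
    """For each frame, pick the real frame whose camera center is closest
    to the desired paired-view position (illustrative sketch).

    centers  : (N, 3) list/array of camera centers estimated by SfM
    baseline : desired stereo baseline along the horizontal axis
    """
    centers = np.asarray(centers, dtype=float)
    # Desired paired view: shift each camera center by the baseline
    # along the (assumed) horizontal motion direction, here +x.
    targets = centers + np.array([baseline, 0.0, 0.0])
    pairs = []
    for t in targets:
        dists = np.linalg.norm(centers - t, axis=1)
        pairs.append(int(np.argmin(dists)))
    return pairs

# Toy example: a camera moving at non-uniform speed along x.
# Note the last frame falls back to itself, since no farther frame exists.
centers = [[0, 0, 0], [0.4, 0, 0], [1.1, 0, 0], [1.5, 0, 0]]
print(pick_paired_frames(centers, baseline=0.5))  # → [1, 2, 3, 3]
```

The non-uniform spacing of the toy centers shows why this beats the constant-N baseline: the chosen pair index depends on how far the camera actually moved, not on elapsed frames.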
To do so, we use Bundler, a structure-from-motion system, to estimate the camera parameters for each frame of the video, from which we can calculate where the paired view for each frame is supposed to be. Given that location, we pick the frame in the video closest to it as the synthesized view. In the third method, we synthesize the paired view by interpolating, at the computed paired-view location, between the two frames closest to that location. We compute motion vectors between the two frames with a fixed-size block matching algorithm, weight the motion vectors by the relative distance between the paired-view location and the real frames, and finally warp one of the real frames using the weighted motion vectors. In the fourth method, instead of interpolating the paired view with block-matching motion vectors, we compute the optical flow between the two frames. Although the scope of this project is restricted to videos with a static scene and horizontal camera motion, we believe this project can be a first step toward future research on 2D to 3D video conversion methods.

B. Design problem and issues

The motivation of this project is to explore ways to convert 2D video of any kind to 3D, since most videos are still filmed in 2D and 3D videos are more engaging to watch. One can imagine that this is not a trivial task, since 2D video lacks the information for the third dimension. In principle, such an algorithm could perform a perfect conversion if it somehow recovered the depth information for every frame of the video. Although this kind of problem has been researched in the field of computer vision for a long time, there is still no standard way to solve it for every kind of camera motion and scene, which are often complicated in movies.
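The third method's two stages, fixed-size block matching followed by distance-weighted warping, can be sketched as below. This is an illustrative toy, not the report's code: the block size (8), search window (±4 pixels), SAD matching cost, and grayscale 2D-array frames are all assumptions made here, and the forward warp is naive (overlaps and holes are left unhandled).

```python
import numpy as np

def block_motion(prev, nxt, block=8, search=4):
    """Fixed-size block matching: for each block of `prev`, find the
    best-matching block in `nxt` within a +/-`search` pixel window,
    scored by sum of absolute differences (SAD)."""
    H, W = prev.shape
    mv = np.zeros((H // block, W // block, 2), dtype=int)  # (dy, dx) per block
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            ref = prev[y:y + block, x:x + block].astype(float)
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue  # candidate block falls outside the frame
                    cand = nxt[yy:yy + block, xx:xx + block].astype(float)
                    sad = np.abs(ref - cand).sum()
                    if sad < best:
                        best, best_v = sad, (dy, dx)
            mv[by, bx] = best_v
    return mv

def interpolate_view(prev, mv, alpha, block=8):
    """Naively forward-warp `prev` by the motion vectors scaled by `alpha`,
    the relative distance of the virtual paired view between the two real
    frames (0 = at `prev`, 1 = at `nxt`)."""
    out = np.zeros_like(prev)
    H, W = prev.shape
    for by in range(mv.shape[0]):
        for bx in range(mv.shape[1]):
            y, x = by * block, bx * block
            dy, dx = (alpha * mv[by, bx]).round().astype(int)
            yy = int(np.clip(y + dy, 0, H - block))
            xx = int(np.clip(x + dx, 0, W - block))
            out[yy:yy + block, xx:xx + block] = prev[y:y + block, x:x + block]
    return out
```

The fourth method replaces `block_motion` with a dense optical-flow field (one vector per pixel instead of per block) but weights and warps in the same way.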
However, the problem can be divided into sub-problems, since different kinds of scenes pose different levels of difficulty. Scenes with moving objects are hard to handle, since it is hard to predict an object's motion relative to the camera. Static scenes, on the other hand, seem approachable, since the depth can be estimated from the camera motion. Complicated camera motion also raises the difficulty, because the video may not contain enough information to recover depth or synthesize views. Therefore, given the time constraints of this project, its scope covers only video with a static scene and horizontal camera motion. Under these assumptions, the main issue becomes how to synthesize the paired view for each frame of the video so that the viewer perceives depth and does not notice discontinuities while watching. The requirements of the system are as follows:
• The algorithm should convert a 2D video of a static scene with horizontal camera motion into 3D.
• The viewer should feel that the output is similar to 3D video filmed with a stereo camera.
• The output 3D video should represent the content of the original 2D video without much distortion.



Publication date: 2011